Text File | 1986-08-12 | 26KB | 759 lines
UASM.DOC
UASM (for Unassembler) consists of five files at this
time: UASM.DOC, UASM-JMP.BAS, UASM-INT.BAS, UASM-STR.BAS and
UASM-DOS.MAC, with the purpose of converting the unassembled
listing of a .COM file from DEBUG into a .ASM file which can
be modified and re-assembled with the Macro assembler.
**************************** NOTICE ****************************
USER SUPPORTED SOFTWARE (With thanks to Andrew Flugelman)
A limited license is granted to all users of these programs,
to make and distribute copies for other users subject to the
following conditions:
1. None of the notices or credits are to be bypassed,
altered, or removed.
2. The programs are not to be distributed in modified
form. (Users are encouraged to distribute MERGE
files.)
3. No fee is to be charged (or any other consideration
received) for copying or distributing the programs
without an express written agreement with White Crane
Systems.
***************************************************************
UASM - The White Crane Systems Unassembler
If you are using these programs and finding them of value
please send a cash contribution to support their upkeep and
distribution. Use the UASM system of programs to unassemble
one average length .COM file, look over the results and calcu-
late how many hours this would have taken you to produce.
Multiply this by the minimum wage, contribute that amount,
and use the program free thereafter. If that's too much just
send $20. Supporters will receive free notice of enhancements
and updates.
In any case you are encouraged to copy and distribute
UASM to your friends provided you do so free of charge and
in unmodified form.
Guy C. Gordon
White Crane Systems
3194 Friar Tuck Way
Doraville, GA 30340
INTRODUCTION
The strategy used in this system is to capture the output
of DEBUG and run it through a series of BASIC programs, each
of which modifies one type of statement in the listing, making
it more like an .ASM source file. This keeps each program
short and fast, and allows you to look over the output at each
step to make sure no mistakes have been entered. It also makes
the programs easy to understand and improve, as new steps can
be added without interfering with the first steps. Later in
its development UASM will combine these steps. I hope that
users of these programs will send me their improvements so
that I may add them to future releases.
UASM-JMP takes captured unassembled code from DEBUG (which
we will name FILE.DB) and finds all addresses referenced by
the various Jump, Call, and Loop instructions. These referenced
addresses are made into labels of the form Lhhhh (where hhhh
is the hex address). A new file (FILE.JMP) is then written
in the form of assembler source code. All of the addresses
and hex opcodes in the left two columns of the DEBUG listing
are left out. Referenced lines are appropriately labeled as
Lhhhh:.
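As a rough illustration of the two passes described above, here
is a minimal sketch in Python (not the BASIC of UASM-JMP itself).
The line layout "ssss:oooo hexbytes mnemonic operands", the
function names, and the rule that any mnemonic starting with J,
CALL or LOOP is a branch are all assumptions made for the example.

```python
import re

# Mnemonic prefixes treated as branches. This list is an assumption;
# the real UASM-JMP.BAS may recognise a different set.
BRANCHES = ("J", "CALL", "LOOP")


def parse_line(line):
    """Split 'ssss:oooo HEX MNEMONIC [OPERANDS]' into its parts."""
    addr, _hex, rest = line.split(None, 2)
    offset = addr.split(":")[1].upper()
    parts = rest.split(None, 1)
    mnemonic = parts[0]
    operands = parts[1].strip() if len(parts) > 1 else ""
    return offset, mnemonic, operands


def branch_target(mnemonic, operands):
    """Return the 4-digit hex target of a Jump, Call or Loop, else None."""
    is_branch = mnemonic.upper().startswith(BRANCHES)
    if is_branch and re.fullmatch(r"[0-9A-Fa-f]{4}", operands):
        return operands.upper()
    return None


def label_listing(lines):
    parsed = [parse_line(l) for l in lines if l.strip()]
    # Pass 1: collect every referenced address (a set drops duplicates).
    targets = set()
    for _, mnemonic, operands in parsed:
        target = branch_target(mnemonic, operands)
        if target:
            targets.add(target)
    # Pass 2: drop the address and hex columns, label referenced lines,
    # and rewrite branch operands as Lhhhh.
    out = []
    for offset, mnemonic, operands in parsed:
        label = f"L{offset}:" if offset in targets else ""
        target = branch_target(mnemonic, operands)
        if target:
            operands = "L" + target
        out.append(f"{label:<8}{mnemonic} {operands}".rstrip())
    return out
```

Indirect branches such as JMP [BX] carry no four-digit address
and are passed through unchanged by this sketch.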
UASM-INT reads FILE.JMP and writes FILE.INT in which it
has added Macro calls and comments explaining the various Inter-
rupts. The macros, symbols, and comments are read from the
file UASM-DOS.MAC. This file contains a table of EQUates which
define the symbols for the various DOS function calls and the
DOSCALL macro. It is included in FILE.INT by means of an
INCLUDE directive.
UASM-STR reads FILE.INT and writes FILE.STR, attempting
to find all strings and variables used by FILE.COM. When it
finds an address it reads the string or variable from FILE.COM
and generates the appropriate data statement (e.g. Dhhhh DB
'string') which it appends to FILE.STR, and comments each
line of code which references that address.
From that point on, you must take over and supply the
remaining text strings and variables that are addressed. You
should heavily comment the code as you go through it and change
the labels that UASM has assigned into more meaningful names.
This is best done with the global change command in your text
editor. I also recommend using the Macro CREF program to obtain
a cross reference map of the symbols.
These programs are by no means infallible, and they can
no more read the programmer's mind than you or I, so you will
have to check the output closely. If you expect to simply
run UASM and be handed a usable source file you're going to
be disappointed. On the other hand, if you've ever tried to
understand a program from just a DEBUG listing you will be
pleasantly surprised. UASM will aid you in studying other
programs by doing a lot of the dirty work for you, but if you
don't study the code you won't get usable output.
I have been using these programs to unassemble DEBUG.COM
and COMMAND.COM. When I have them sufficiently commented I
will post them on the BBS's. It is my hope that UASM will
lead to a whole library of well commented, "reverse engineered"
source code for the MS-DOS operating system and utilities.
I would appreciate it if anyone else working on the same would
upload their results to the BBS. Suggestions and improvements to UASM
are welcome. I may be contacted through any of the IBM-PC
BBS's in Atlanta, or write:
Guy C. Gordon
White Crane Systems
3194 Friar Tuck Way
Doraville, GA 30340
OPERATING INSTRUCTIONS
-DEBUG-
As an example, we will unassemble a fictitious file,
FILE.COM
A>debug file.com
-r
.....CX=1780 ... ;file length in hex bytes
-d 100 l 1780 ;display entire file
In the listing that follows you should be able to spot
ASCII text and any regular binary tables. Write down the be-
ginning and ending addresses of these, as we do not want to
unassemble them, but we will want a printed copy. Our aim
is to put together a list of all blocks of code to be unassem-
bled and string addresses for UASM-STR. Look at the code before
each block of text. Usually it will be preceded by a hex C3
which is a RET instruction, but there may be a JMP, JMPS, IRET,
or RETF instead. This is the last instruction we want to unas-
semble in the block of code preceding the text. Take your
time and go through the entire file, unassembling code and
making sure that the output looks reasonable.
Reasonable code contains such things as CALL or Jump in-
structions to nearby addresses, INT 21 instructions and multiple
operations on single registers. It does not contain DB instruc-
tions or very many 00 bytes. Also the ASCII display of a sec-
tion of code will look totally random, with about 50% of it
being displayable characters. (The rest will be periods.)
Peter Norton has given a good demonstration of this in chapter
6 of "Inside the IBM-PC". One warning--the DEBUG unassembler
tends to lock into phase with the correct code, which is very
nice, but be certain that the beginning few instructions are
also in phase. Sections of code that are in phase will contain
Jumps and CALLs to other sections, thus telling you where to
start unassembling.
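The "about 50% displayable" rule of thumb above can be checked
mechanically. The following is a small sketch in Python; the
function name is hypothetical, and the range 20H-7EH is an
assumption about which bytes the D command shows as characters
rather than periods.

```python
def printable_fraction(data):
    """Fraction of bytes that would display as characters, not periods."""
    if not data:
        return 0.0
    shown = sum(1 for b in data if 0x20 <= b <= 0x7E)
    return shown / len(data)
```

A block that scores close to 1.0 is probably text, and a block
full of 00 bytes scores close to 0.0. The 50% figure is only a
guide, so a score near it is a hint that the block is code and
not proof.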
At the end of this investigation of the .COM file you
should have a list of the starting and ending addresses of
all the code blocks and all the string blocks. The next step
depends upon whether you have DOS 2.0 or not. It is much easier
if you have 2.0, or can do this part on a friend's machine
who has it. This is because under DOS 2.0 we can pipe the
output of DEBUG into a file thus capturing the unassembled
code for input to UASM-JMP. Under DOS version 1 we must modify
DEBUG (using DEBUG of course) to get it to write the file we
need.
DEBUG - 2.0 Instructions
Create a file, FILE.IN, with the following DEBUG instruc-
tions:
u addr 1 addr 2 ;addresses of blocks of
u addr 3 addr 4 ; code to unassemble
u addr 5 addr 6 ; from our initial investigation
q ;Must have Quit instruction at end
Now we can run DEBUG and pipe the output to a disk file.
DEBUG FILE.COM <FILE.IN >FILE.DB
FILE.DB is the input for UASM-JMP.
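If the list of code blocks is kept in a program rather than
typed by hand, FILE.IN could be produced along these lines. This
is only a sketch in Python; the helper name debug_script and the
four-digit upper-case address format are choices made for the
example.

```python
def debug_script(blocks):
    """Build the text of FILE.IN from (start, end) address pairs."""
    lines = [f"u {start:04X} {end:04X}" for start, end in blocks]
    lines.append("q")  # DEBUG must be told to quit at the end
    return "\r\n".join(lines) + "\r\n"
```

For example, debug_script([(0x100, 0x4D5), (0x6B0, 0x799)])
yields two u lines followed by q, ready to be written to FILE.IN.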
DEBUG - 1.1 Instructions
While it is quite easy to capture the output of DEBUG under
DOS 2.0 since we can pipe it to a file, under earlier versions
of DOS we have no such option. However, DEBUG is an exceptional-
ly powerful program, and already contains the code necessary
to write a disk file with the Write command. We will use this
to capture the Unassembled code.
If we unassemble and examine DEBUG, we can find the follow-
ing subroutine:
02C8:02C0 PUSH AX       ;save registers
          PUSH DX
          AND  AL,7F    ;ensure character is ASCII
          XCHG DX,AX    ;put character in DL
          MOV  AH,02    ;DOS Function 2 to display DL
          INT  21
          POP  DX       ;restore registers
          POP  AX
          RET           ;return
As it turns out, DEBUG does all screen output through
this subroutine. Thus we can modify just this subroutine and
capture each character as it is displayed. What we will do
with it is write it out to an unused portion of memory. From
there we can write all the output to a file using the Write
command.
Our subroutine to store character AL in consecutive memory
locations will be very small--about 20 bytes. We'll need some-
place to put it. For DEBUG 1.07 I chose to put it inside a
string which is only printed once--the message "DEBUG version
1.07" located at CS:0102. Here is the subroutine:
02C8:0102 DW   3300       ;pointer to memory
          PUSH DI         ;save index register
          SEG  CS         ;offset from code, not ES
          MOV  DI,[0102]  ;get pointer
          SEG  CS         ;
          STOSB           ;store char in AL into memory
          SEG  CS         ;
          MOV  [0102],DI  ;store incremented pointer
          POP  DI         ;restore register
          XCHG DX,AX      ;complete the instructions that
          MOV  AH,02      ; CALL to this routine replaced
          RET             ;Return to Display routine
We can store this subroutine over the string with the
Enter command. (here 02C8 is the base segment where DEBUG is
loaded on my system):
E 2C8:102 00 33 57 2E 8B 3E 02 01 2E AA 2E 89 3E 02 01 5F 92
B4 02 C3
We can check that this was entered correctly by Unassembling
it:
U 2C8:104 ;you should see the subroutine listed above.
The choice of memory location is up to you. 3300 is the
value I used while unassembling DEBUG. It should be larger
than the sum of the sizes (in bytes) of DEBUG and the program
you are unassembling. To have this subroutine called each
time DEBUG writes a character, we insert a subroutine Call:
E 2C8:2C4 E8 3D FE ;Call 0104
This puts a CALL 0104 in place of XCHG DX,AX and MOV AH,02.
That is why we perform those instructions before returning
to the display routine. The very next character printed by DEBUG
after you Enter the above command will be stored in location
2C8:3300 as well as displayed on the screen.
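The three bytes E8 3D FE follow from how the 8086 encodes a near
CALL: opcode E8, then a 16-bit displacement measured from the
instruction after the CALL and stored low byte first. A short
Python sketch of the arithmetic (the function name is
hypothetical):

```python
def near_call_bytes(call_at, target):
    """Encode an 8086 near CALL (opcode E8) placed at offset call_at."""
    next_ip = call_at + 3               # E8 plus a 2-byte displacement
    disp = (target - next_ip) & 0xFFFF  # wraps for backward calls
    return bytes([0xE8, disp & 0xFF, disp >> 8])
```

With the CALL at 02C4 and the capture routine at 0104, the
displacement is 0104 - 02C7 = FE3D, giving E8 3D FE. If the
capture routine is stored elsewhere, or the display routine sits
at another address, the same calculation gives the bytes to Enter.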
Immediately after entering the CALL instruction above
you should begin the Unassemble commands that you determined
will give you all the code for the program.
U 100 4D5
U 6b0 799
etc.
D 2C8:102 103 ;This displays the pointer to the end of text
B3 D9 ;This means we filled memory to D9B3
;(remember the 8088 stores words LSB first)
H D9B3 3300 ;Hex arithmetic
0CB3 A6B3 ; D9B3 - 3300 = A6B3
R CX
CX=1748
:A6B3 ;load CX register with # of bytes to write
N FILE.DB ;name the output file
W 2C8:3300 ;writing from 3300 off. from DEBUG base
Writing A6B3 bytes
E 2C8:102 00 33 ;reset pointer if out of space
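The pointer arithmetic in the session above can be sketched in
Python as follows. The function names are hypothetical; the
values are the ones shown above.

```python
def bytes_captured(pointer_bytes, start=0x3300):
    """Decode the LSB-first pointer and return (end, byte count)."""
    end = int.from_bytes(pointer_bytes, "little")
    return end, (end - start) & 0xFFFF


def debug_hex(a, b):
    """Mimic DEBUG's H command: 16-bit sum and difference."""
    return (a + b) & 0xFFFF, (a - b) & 0xFFFF
```

The displayed bytes B3 D9 decode to D9B3, and D9B3 - 3300 = A6B3
is the count to load into CX. The first number printed by H is the
sum, which wraps around to 0CB3 in 16 bits and is not needed here.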
Remember, you can only write text to memory up to
2C8:FFFF. If you exceed that you will write over DEBUG at 2C8:0000
and will probably have to re-boot. If FILE.COM is too big
to Unassemble in one pass you'll have to do it in pieces and
append them together with your text editor. For this reason
it is a good idea to modify and save a copy of DEBUG under
another name such as UDEBUG. If you need to perform any other
operations with a modified DEBUG that you do not want written
to memory you can restore DEBUG to normal operation with:
E 2C8:2C4 92 B4 02 ;restores XCHG DX,AX and MOV AH,02
Now text edit FILE.DB and remove any extraneous lines such
as debug prompts that might have been displayed. If there
are any TABs in FILE.DB they will confuse UASM-JMP and the
others. DEBUG 1.1 appears to put a TAB after each instruction
while version 2.0 does not. I always use the text editor to
change all TABs to the appropriate number of spaces. (Users
of PMATE, use the YF command.)
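Where the text editor has no such command, the TABs can be
expanded by a few lines of code. This Python sketch assumes tab
stops every 8 columns, which may not match what your DEBUG
version intended; the function name is hypothetical.

```python
def detab(text, width=8):
    """Replace each TAB with spaces up to the next tab stop."""
    return "\n".join(line.expandtabs(width) for line in text.split("\n"))
```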
Any of the memory addresses above may vary with your operat-
ing system and DEBUG version. The values given are for the
Victor 9000, MS-DOS 1.25a, and DEBUG 1.07. The Base Segment
where DEBUG is loaded (2C8 above) will depend upon your machine
and operating system, and is found by using DEBUG to Search
for itself in memory. The display subroutine (2C0 above) de-
pends upon your DEBUG version number. The same subroutine
occurs at 2B5 in the DEBUG that comes with PC-DOS 1.10, and
will appear near these locations in any other version 1 DEBUGs.
If you store the capture subroutine at some other place in
memory you need to change the two [0102] references and the
CALL 0104 instruction.
UASM-JMP Instructions
Run UASM-JMP as you would any basic program. It will
prompt you for the name of input and output files. Respond
with FILE.DB, which we created above, and B:FILE.JMP for
output. If file extensions are not provided, .DB and .JMP will
be assumed for input and output respectively. Also the output
file name will default to the input file name. I highly recom-
mend putting these files on separate drives if you don't have
a fixed disk or a RAM disk. This will speed up the program
and save wear on your floppies.
UASM-JMP will make two passes through the input file.
On the first pass it will build a list of all referenced lines.
It then sorts this list (shell sort), eliminates duplicate
references, and on the second pass, labels all of the referenc-
es. The output will be displayed on your screen as well as
written out on the second pass.
If the program finds a Jump or CALL to an address not
contained in the file you will get the message "WARNING! No
code for this label". This most likely means you missed the
block of code starting at address hhhh and will have to add
it to FILE.DB. The statement after an unconditional program
transfer (JMP or RET) is always labeled. The message "WARNING!
This label not referenced" means that there is no Jump or CALL
to this label. It might be an interrupt handler or, in a highly
modified program, it might be code left over from an earlier
version which is no longer executed. (NOP instructions are
not force labeled, but the following instruction is.) A large
number of these errors might indicate that they are accessed
by an address table. Both of the above errors might occur
if you miss a block of code, unassemble a data area, or the
code modifies itself.
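The two warnings can be pictured with the following Python
sketch. It is not the logic of UASM-JMP.BAS itself: the data
layout and function name are hypothetical, and treating JMPS, RETF
and IRET as unconditional transfers alongside JMP and RET is an
assumption.

```python
# Unconditional transfers; entries beyond JMP and RET are assumed.
UNCONDITIONAL = {"JMP", "JMPS", "RET", "RETF", "IRET"}


def check_labels(listing, targets):
    """listing: (offset, mnemonic) pairs in address order.
    targets: set of offsets referenced by Jumps and CALLs.
    Returns (missing, unreferenced) lists of offsets."""
    offsets = {off for off, _ in listing}
    # "No code for this label": referenced but not in the file.
    missing = sorted(targets - offsets)
    # "This label not referenced": force-labeled after a transfer,
    # skipping NOPs, yet no Jump or CALL points at it.
    unreferenced = []
    force = False
    for off, mnemonic in listing:
        if force and mnemonic != "NOP":
            if off not in targets:
                unreferenced.append(off)
            force = False
        if mnemonic in UNCONDITIONAL:
            force = True
    return missing, unreferenced
```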
For added readability, UASM-JMP inserts one blank line
after each JMP or JMPS instruction and three lines after a
RET or IRET. This helps separate procedures.
UASM-INT Instructions
To run UASM-INT you must also have the data file UASM-
DOS.MAC on one of the drives. UASM-INT will prompt you for
input and output file names. If extensions are not provided,
.JMP and .INT will be assumed for input and output respective-
ly. The program then loads the symbol table contained in UASM-
DOS.MAC. While reading through FILE.JMP, whenever UASM-INT
encounters an INT instruction it adds a Macro call, Symbols
for the DOS function calls, and Comments, all from the UASM-
DOS.MAC file. These lines will also be displayed on the screen
as the program progresses. Note that the DOSCALL Macro is
inserted in the text, but the INT instructions are not deleted.
After you have checked the code you must delete the INT and
any MOV instructions that will be duplicated by the Macro.
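The idea of matching an INT 21 to its DOS function can be
sketched in Python as below. The table holds only the handful of
symbols that appear in the sample output later in this document;
the real table is in UASM-DOS.MAC, whose format is not reproduced
here. The function name and the way AH is tracked are assumptions,
and the sketch omits the operand and comment that UASM-INT adds.

```python
import re

# Function numbers and symbols as seen in the sample output only.
DOS_SYMBOLS = {0x09: "PRINT$", 0x0A: "INSTR$", 0x1A: "SET$DTA",
               0x25: "SET$INT", 0x26: "BUILD$PS", 0x29: "PARSE$"}


def add_doscalls(lines):
    """Copy lines, adding a DOSCALL line after each known INT 21."""
    ah = None  # last value seen loaded into AH, if any
    out = []
    for line in lines:
        out.append(line)
        code = line.split(";")[0].upper()  # ignore comments
        m = re.search(r"\bMOV\s+AH,([0-9A-F]{2})\b", code)
        if m:
            ah = int(m.group(1), 16)
        m = re.search(r"\bMOV\s+AX,([0-9A-F]{2})[0-9A-F]{2}\b", code)
        if m:
            ah = int(m.group(1), 16)  # high byte of AX is AH
        if re.search(r"\bINT\s+21\b", code) and ah in DOS_SYMBOLS:
            out.append("DOSCALL " + DOS_SYMBOLS[ah])
    return out
```

As in UASM-INT, the INT instruction is left in place, so the
output still has to be checked and the duplicates removed by hand.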
UASM-STR Instructions
To run UASM-STR you must have the original FILE.COM or
other binary file on disk. The program will prompt you for
the input, output, and binary file names. These will default
to .INT, .STR and .COM if no other extension is given. As
usual, the input file name will be used as a default if you
do not specify the others, and you should put the output file
on a different floppy drive than the input file.
You will then be prompted for any string area addresses
that you may have found while examining FILE.COM with DEBUG.
You may enter an address range (hhhh kkkk), the address of
a single string (hhhh), or an address and a length (hhhh Ln)
on each line. (Up to 20 lines). Upon receiving a blank line
as input, the program will find all strings terminated with
a $ starting at the first address in a range and continue find-
ing multiple strings to the second address if present. If
a single address is given on a line a single string will be
read. If a length is provided, the string will be truncated
to that length or at the terminator, whichever comes first.
(This is useful for data strings which do not have $
terminators.) Each string is displayed as it is found.
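The three input forms described above could be told apart as in
this Python sketch. The function name is hypothetical, and reading
the length n as hexadecimal (as DEBUG itself would) is an
assumption; the document does not say which radix UASM-STR uses.

```python
def parse_area(line):
    """Return (start, end, length); fields not given are None."""
    fields = line.split()
    start = int(fields[0], 16)
    if len(fields) == 1:
        return start, None, None             # hhhh
    second = fields[1]
    if second.upper().startswith("L"):
        return start, None, int(second[1:], 16)   # hhhh Ln
    return start, int(second, 16), None      # hhhh kkkk
```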
Following this the program reads through FILE.INT. For
each "DOSCALL PRINT$ hhhh" encountered it reads the string
from FILE.COM at the specified location (taking into account
the 100H byte program prefix) and prints that string as a com-
ment next to the Macro. Also, each time a register is loaded
with the address of a string, that string is shown next to
the code. At the end of the file, UASM-STR will append a number
of EQUates and Data statements and define the string variables
with names Dhhhh. Non-printing characters are converted into
hex bytes. CR, LF, TAB, ESC, and $ are defined as symbols.
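The string lookup and the generated data statement can be
sketched in Python as follows. The function names are
hypothetical. The sketch puts the $ terminator outside the quotes,
and adds a leading 0 to hex bytes that begin with a letter so the
assembler reads them as numbers; both are choices made for the
example and may differ from what UASM-STR does.

```python
SYMBOLS = {0x0D: "CR", 0x0A: "LF", 0x09: "TAB", 0x1B: "ESC", 0x24: "$"}


def read_string(image, address, limit=None):
    """Bytes of the string at a run-time address, through its $."""
    start = address - 0x100          # a .COM image loads at offset 100H
    end = image.find(b"$", start)
    end = len(image) if end < 0 else end + 1
    if limit is not None:
        end = min(end, start + limit)
    return image[start:end]


def hex_byte(b):
    text = f"{b:02X}"
    return text if text[0].isdigit() else "0" + text


def db_statement(address, data):
    """Format bytes as 'Dhhhh DB ...' using the symbols above."""
    items, run = [], ""
    for b in data:
        # Symbols, the quote character, and non-printing bytes
        # cannot go inside the quoted text.
        if b in SYMBOLS or b == 0x27 or not 0x20 <= b <= 0x7E:
            if run:
                items.append("'" + run + "'")
                run = ""
            items.append(SYMBOLS.get(b) or hex_byte(b))
        else:
            run += chr(b)
    if run:
        items.append("'" + run + "'")
    return f"D{address:04X} DB " + ",".join(items)
```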
DOSCALLs that do file I/O and that load the address of
the File Control Block into DX will generate that FCB as a
string. Any address which is used within brackets (e.g.
LEA DX,[hhhh]), that is not already a known string address,
is assumed to be the address of a variable. A data statement
is generated for the variable, and two bytes are extracted
from FILE.COM to show its initial value.
SAMPLE OUTPUT - Excerpts from DEBUG.STR
        INCLUDE UASM-DOS.MAC
        .RADIX 16
START:  JMPS L011D
L011D:  MOV SP,1822
        MOV [1897],AL
        MOV DX,0102
        MOV AH,09
        INT 21
        DOSCALL PRINT$,D0102 ;CR,LF,'DEBUG-86 version 1.07',CR,LF,$
        MOV AX,2522
        MOV DX,01E6
        INT 21
        DOSCALL SET$INT 01E6 ; Set interrupt vector (AL=INT, DS:DX=VECTOR)
        MOV AL,23
        MOV DX,01EB
        INT 21
        DOSCALL SET$INT 01EB ; Set interrupt vector (AL=INT, DS:DX=VECTOR)
        MOV DX,CS
        ADD DX,01AB
        MOV AH,26
        INT 21
        DOSCALL BUILD$PS 01AB ; Create new program segment (DX=SEGMENT)
        MOV AX,DX
        MOV DI,1832
        STOSW
        MOV DX,0080
        MOV AH,1A
        INT 21
        DOSCALL SET$DTA 0080 ; Set Disk Transfer Address to DX
        MOV AX,[0006]
        MOV BX,AX
        CMP AX,FFF0
        PUSH CS
        POP DS
        ADD [0008],BX
        MOV DI,005C
        MOV SI,0081
        MOV AX,2901
        INT 21
        DOSCALL PARSE$ ; Parse Filespec (SI -> LINE, DI -> FCB, AL=CODE)
        CALL L0917
        PUSH CS
        POP ES
        CMP B,[005D],20
        JZ L01B5
        JMPS L01B5
L01E3:  JMP L04CB
L01E6:  MOV DX,167A ;WARNING! This label not referenced
        MOV DS,AX
        MOV SS,AX
        MOV SP,1822
        MOV AH,09
        INT 21
        DOSCALL PRINT$ ; Display string @DX till terminator
        JMPS L01B5
L01FD:  MOV AH,0A
        MOV DX,1844
        INT 21
        DOSCALL INSTR$ 1844 ; Input keyboard string (DX -> size,cnt,buffer)
        MOV SI,1846
;END CODE
.RADIX 16
CR EQU 0D
LF EQU 0A
TAB EQU 09
ESC EQU 1B
$ EQU 24
D167A DB CR,LF,'Program terminated normally',CR,LF,$
D169A DB 'Invalid drive or file name',CR,LF,$
D16B7 DB 'File not found',CR,LF,$
D16C8 DB 'No room in disk directory',CR,LF,$
D16E4 DB 'Insufficient space on disk',CR,LF,$
D1701 DB 'Disk$'
D1706 DB 'Write protect$'
D1714 DB ' error reading drive A',CR,LF,$
D172D DB 'readwritInsufficient memory',CR,LF,$
D174B DB '^ Error',CR,8A,' ',88,'Error in EXE/HEX file',CR,LF,$
D176E DB 'EXE/HEX file cannot be written',CR,LF,$
D178F DB 'Writing $'
D1798 DB ' bytes',CR,LF,$
D0102 DB CR,LF,'DEBUG-86 version 1.07',CR,LF,$